Skip to content

Address external review: fix release-blocking bin bug + close permission-drift blind spots#49

Merged
Conalh merged 3 commits into
mainfrom
fix/cli-bin-symlink-and-codex-baseline-fp
May 29, 2026
Merged

Address external review: fix release-blocking bin bug + close permission-drift blind spots#49
Conalh merged 3 commits into
mainfrom
fix/cli-bin-symlink-and-codex-baseline-fp

Conversation

@Conalh
Copy link
Copy Markdown
Owner

@Conalh Conalh commented May 29, 2026

Implements the full external review of ScopeTrail across two commits.

Trust-critical pair (first commit)

  • Add GitHub Action diff mode #1 — Silent-pass bin bug. import.meta.url === process.argv[1] was false when the scopetrail bin was launched through an npm symlink (npx/global install), so main() never ran and the CLI exited 0 with no output — a CI gate silently passing every PR. Now resolves both sides through realpathSync. Added a test that runs the built bin through a symlink (existing CLI tests only ran node dist/index.js directly).
  • Add line-level GitHub annotations #6 — Codex baseline false positives. A missing base value ranked -1, so a brand-new .codex/config.toml at the narrowest posture (read-only sandbox, untrusted approval) was reported as a high-severity "widening." The baseline now anchors at Codex's safe default; genuinely-wide new configs (danger-full-access, never) still fire.

Detector blind spots (second commit)

  • Add traction plan and feedback intake #4 Merge all recognized MCP server maps — an empty mcpServers: {} no longer shadows a populated servers: {}.
  • Demo: risky agent permission drift #3 Diff sensitive MCP fields (env, headers, cwd) on existing servers. A server keeping its command but gaining a secret env var or auth header now surfaces; secret-bearing key names escalate to high. Covers .mcp.json and Codex [mcp_servers.NAME]. Removals stay silent (narrowing). New kinds: mcp_server_sensitive_field_changed, codex_mcp_sensitive_field_changed.
  • Add public CI and Action metadata #5 Model Claude hook entries by (matcher, type, command), so rebinding a guard from one tool to another (same command) is caught. ⚠️ Behavior reversal: this overturns the prior "matcher change is noise" decision and its test. Rationale: a PreToolUse guard moved from Bash to Read stops guarding Bash — a real enforcement-surface change. Flagging if you'd rather keep the old behavior.
  • Update checkout action to v6 #7 Expand the critical removed-deny patterns beyond .env/secret/credential/.pem to SSH keys, *.key, cloud credentials, registry tokens, and kube configs.

Docs & supply-chain

Verification

  • 76/76 tests pass (Node --test); each finding ships a unit test.
  • Benchmark corpus grows to 35 cases (27 rogue, 8 benign) across 21 detector kinds; precision/recall hold at 100% / 0% FP, zero disagreements. RESULTS.md regenerated and README counts synced.
  • Committed dist/ rebuilt (CI gates on it).

🤖 Generated with Claude Code

Conalh and others added 3 commits May 29, 2026 09:38
Two trust-critical fixes flagged in external review.

1. npm installs the `scopetrail` bin as a symlink, so the entrypoint's
   `import.meta.url === process.argv[1]` main-module check was false when
   launched via `npx`/global install: main() never ran and the CLI exited
   0 with no output -- a CI gate that silently passed every PR. Resolve both
   sides through realpathSync. Add a test that runs the built bin through a
   symlink (the existing CLI tests only ran `node dist/index.js` directly).

2. Codex sandbox/approval drift ranked a missing base value at -1, so a
   brand-new .codex/config.toml that merely set the narrowest posture
   (read-only sandbox, untrusted approval) was reported as a high-severity
   widening/weakening -- a false positive on the safest possible config.
   Anchor the missing baseline at Codex's safe default so only settings
   genuinely wider than that default surface; a brand-new danger-full-access
   sandbox or `never` approval still fires.

Add unit tests for both Codex directions and a benign benchmark fixture
(codex-baseline-narrowest); regenerate RESULTS.md (29 cases, precision/recall
unchanged at 100%/0% FP) and sync the README corpus counts.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Implements the rest of the external review (the trust-critical bin/Codex
pair landed in the previous commit).

Detector blind spots:
- #4 Merge all recognized MCP server maps instead of first-map-wins, so an
  empty `mcpServers: {}` can no longer shadow a populated `servers: {}` in
  Cursor/VS Code configs.
- #3 Diff sensitive MCP fields (env, headers, cwd) on existing servers, which
  serverCommand()-based comparison ignored. A server keeping the same command
  but gaining a secret env var or auth header now surfaces; secret-bearing key
  names escalate to high. Applies to .mcp.json and Codex [mcp_servers.NAME].
  Removals stay silent (narrowing, not widening). New kinds:
  scope_trail.mcp_server_sensitive_field_changed and
  scope_trail.codex_mcp_sensitive_field_changed.
- #5 Model Claude hook entries by (matcher, type, command) rather than command
  alone, so rebinding a guard from one tool to another (same command) is
  caught. NOTE: this reverses the prior "matcher change is noise" decision and
  its test — a PreToolUse guard moved from Bash to Read stops guarding Bash,
  which is a real enforcement-surface change.
- #7 Expand the critical removed-deny pattern set beyond .env/secret/
  credential/.pem to SSH keys, *.key, cloud credentials, registry tokens, and
  kube configs.

Docs & supply-chain:
- #2/#8 Qualify the README "local-only" claim: the scanner uploads nothing,
  but the Action's setup step runs `npm ci` from the registry.
- #9 Pin agent-gov-core to exact 1.3.0 (lockfile kept in sync) so npm
  consumers do not silently resolve a newer parser/schema.
- #10 Note that the benchmark figures bound regressions, not real-world field
  accuracy.

Each change ships unit tests and a labeled benchmark fixture. Corpus grows to
35 cases (27 rogue, 8 benign) across 21 detector kinds; precision/recall hold
at 100%/0% false-positive rate, RESULTS.md regenerated and README counts
synced.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
runDiff materialized the base snapshot, then the head snapshot, and only
assigned `cleanup` after both succeeded. If head materialization threw — an
unresolvable head ref (a shallow checkout missing it), or a max-buffer error
on an oversized config — the base snapshot's temp dir was already on disk and
leaked, since cleanup was never wired up. Clean the base snapshot explicitly
before the error propagates.

Found during a fresh correctness pass over the codebase (not part of the
external review). The existing unresolvable-ref test fails on the *base* ref,
before any dir exists, so it never exercised this path. Adds a regression test
that asserts no scopetrail-snapshot temp dir survives a head-side failure.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@Conalh Conalh merged commit 5ba7e4e into main May 29, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant